Abstract:
Markus Hohnerbach et al. recently published work optimizing the performance of the Tersoff potential, a computing scheme used in the LAMMPS molecular dynamics (MD) code. The optimized solver was implemented in three different computation precisions, namely single, mixed, and double. As a special activity of the Student Cluster Competition at the SC17 conference, we aimed to reproduce their experimental studies of the optimized solvers in terms of accuracy, performance, and scalability. We conducted our experiments on a cluster with Intel Xeon Gold 6130 CPUs and Nvidia Tesla P100 GPUs, while the original work was done on a cluster with Intel Xeon E5-2650 CPUs and K40 GPUs. Despite the differences in computing systems, we demonstrate that the claims of the original work can be successfully reproduced: the lower-precision solvers still achieve highly accurate results and exhibit good performance speedup and scalability. (C) 2018 Elsevier B.V. All rights reserved.
Abstract:
As a special activity of the Student Cluster Competition at the SC18 conference, we made an attempt to reproduce the performance evaluations of an optimized version of the earthquake simulation software SeisSol. Our experiments were conducted on a small-scale 4-node cluster with the Intel Skylake CPU architecture, while the performance numbers of the original work [19] were collected from a large-scale 3000-node supercomputer with the Intel Haswell CPU architecture. Both single-node performance and cluster scalability are presented and compared in this work. Overall, we also observed significant time-to-solution reductions from the optimized SeisSol code compared to the non-optimized baseline version. Specifically, the original work achieved a 13.6x speedup on a 221-million-element dataset, and we obtained a 4.77x speedup on a 125-thousand-element dataset. However, due to the differences in cluster size, dataset size, and CPU architecture, we did find different behaviors and trends in performance scalability and floating-point throughput in certain cases. Hence, this work shares our experiences and observations from our reproducibility activity and discusses our findings. (C) 2019 Elsevier B.V. All rights reserved.
Abstract:
With growing applications such as image recognition, speech recognition, ADAS, and AIoT, artificial intelligence (AI) frameworks are becoming popular in various industries. Currently, many neural network frameworks exist for executing AI models in applications, especially for training/inference purposes, including TensorFlow, Caffe, MXNet, PyTorch, Core ML, TensorFlow Lite, and NNAPI. With so many emerging frameworks, exchange formats are needed to move models between them. Given this requirement, the Khronos group created a standard draft known as the Neural Network Exchange Format (NNEF). However, because NNEF is new, conversion tools that would allow models to be exchanged among the various AI frameworks remain missing. In this work, we fill this gap by devising NNAPI conversion tools for NNEF. Our work allows NNEF to execute inference tasks on host and Android platforms and flexibly invokes the Android Neural Networks API (NNAPI) on the Android platform to speed up inference operations. We invoke NNAPI by dividing the input NNEF model into multiple submodels and letting NNAPI execute these submodels. We develop an algorithm named BFSelector, based on a classic breadth-first search with cost constraints, to determine how to divide the input model. Our preliminary experimental results show that our support of NNEF on NNAPI can obtain a speedup of 1.32 to 22.52 times over the baseline for API 27 and of 4.56 to 211 times over the baseline for API 28, where the baseline is the NNEF-to-Android platform conversion without invoking NNAPI. The experiment includes AI models such as LeNet, AlexNet, MobileNet_V1, MobileNet_V2, VGG-16, and VGG-19.
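The abstract only names BFSelector; as a rough illustration of the underlying idea (a breadth-first traversal that grows accelerator-eligible submodels until a cost budget is hit), the Python sketch below partitions a toy operator graph. The graph, cost function, and max_cost threshold are assumptions made for illustration, not the paper's actual implementation.

```python
from collections import deque

def bfs_partition(graph, entry, supported, op_cost, max_cost):
    """Greedily grow submodels during a breadth-first traversal.

    graph: dict mapping op name -> list of successor op names
    supported: set of ops the accelerator API (e.g., NNAPI) can run
    op_cost: dict mapping op name -> estimated cost (assumed metric)
    max_cost: cost budget per submodel (assumed constraint)
    """
    submodels, visited = [], set()
    queue = deque([entry])
    current, current_cost = [], 0
    while queue:
        op = queue.popleft()
        if op in visited:
            continue
        visited.add(op)
        if op in supported and current_cost + op_cost[op] <= max_cost:
            current.append(op)                 # keep growing this submodel
            current_cost += op_cost[op]
        else:
            if current:                        # close the submodel, start a new one
                submodels.append(current)
            current, current_cost = [], 0
            if op in supported:
                current, current_cost = [op], op_cost[op]
        queue.extend(graph.get(op, []))
    if current:
        submodels.append(current)
    return submodels

# Toy chain conv -> relu -> custom_op -> fc; custom_op is unsupported,
# so the model splits into two NNAPI submodels around it.
graph = {"conv": ["relu"], "relu": ["custom_op"], "custom_op": ["fc"], "fc": []}
print(bfs_partition(graph, "conv", {"conv", "relu", "fc"},
                    {"conv": 5, "relu": 1, "custom_op": 3, "fc": 4}, max_cost=10))
```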
Abstract:
Over the past decade, deep convolutional neural networks (CNN) have been widely embraced in various visual recognition applications owing to their extraordinary accuracy. However, their high computational complexity and excessive data storage present two challenges when designing CNN hardware. In this paper, we propose an energy-aware bit-serial streaming deep CNN accelerator to tackle these challenges. Using a ring streaming dataflow and an output reuse strategy to decrease data access, the amount of external DRAM access for the convolutional layers is reduced by 357.26x compared with the no-output-reuse case on AlexNet. We optimize hardware utilization and avoid unnecessary computations using the loop tiling technique and by mapping the strides of the convolutional layers to unit stride for computational performance enhancement. In addition, the bit-serial processing element (PE) is designed to use fewer bits in weights, which reduces both the amount of computation and external memory access. We evaluate our design using the well-known roofline model. The design space is explored to find the solution with the best computational performance and communication-to-computation (CTC) ratio. We achieve a 1.36x speedup and reduce energy consumption for external memory access by 41% compared with the design in [1]. The hardware implementation of our PE array architecture reaches an operating frequency of 119 MHz and consumes 68 k gates with a power consumption of 10.08 mW using TSMC 90-nm technology. Compared to the 15.4 MB of external memory access for Eyeriss [2] on the convolutional layers of AlexNet, our method requires only 4.36 MB of external memory access, dramatically reducing the costliest portion of power consumption.
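As a hedged illustration of the roofline-style design-space exploration mentioned above (not the paper's actual model or numbers), the sketch below computes attainable performance from a peak compute rate, external memory bandwidth, and a communication-to-computation (CTC) ratio; all parameter values are invented placeholders.

```python
def attainable_gops(peak_gops, bandwidth_gbs, ctc_ratio):
    """Roofline model: performance is bounded by either compute or memory.

    ctc_ratio: operations executed per byte moved to/from external memory
    (a higher CTC ratio means the design is more compute-bound).
    """
    return min(peak_gops, bandwidth_gbs * ctc_ratio)

# Explore a few hypothetical tiling choices, each with a different CTC ratio.
peak, bw = 100.0, 10.0          # assumed: 100 GOP/s peak, 10 GB/s DRAM bandwidth
for tile, ctc in [("tile_A", 2.0), ("tile_B", 8.0), ("tile_C", 20.0)]:
    print(tile, attainable_gops(peak, bw, ctc), "GOP/s")
```

Under these assumed numbers, tile_A is memory-bound while tile_C hits the compute roof, which is the kind of comparison such an exploration is meant to expose.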
Abstract:
Embedded multicore systems are playing increasingly important roles in the design of consumer electronics. The objective of such systems is to optimize both the performance and power characteristics of mobile devices. However, there are currently no power metrics supporting popular application design platforms (such as SID) that application developers use to develop their applications, which hinders their ability to optimize power consumption. In this article we present the design and experiments of a SID-based power-aware simulation framework for embedded multicore systems. The proposed power estimation flow includes two phases: IP-level power modeling and power-aware system simulation. The first phase employs PowerMixer(IP) to construct power models for the processor IP and other major IPs, while the second phase uses a power abstract interpretation method to summarize the simulation trace and then, with a CPE module, estimates the power consumption based on the summarized trace information and the input IP power models. In addition, a Manager component is devised to map each digital signal processor (DSP) component to a host thread and maintain access to shared resources, with the aim of sustaining simulation performance as the number of simulated DSP components increases. A power-profiling API is also provided so that embedded-software developers can tune the granularity of power profiling for a specific code section of the target application. We demonstrate via case studies and experiments how application developers can use our SID-based power simulator to optimize the power consumption of their applications. We characterize the power consumption of DSP applications with the DSPstone benchmark and discuss how compiler optimization levels with SIMD intrinsics influence performance and power consumption. A histogram application and an augmented-reality application based on human-face RMS (recognition, mining, and synthesis) are deployed as running examples on multicore systems to demonstrate how developers can use our power simulator in the optimization process and to illustrate different views of the power dissipation of applications.
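To make the two-phase flow more concrete, here is a minimal sketch of the second phase only: summing per-IP energy from a summarized activity trace and per-event energy numbers produced by an IP-level power model. All event names and energy values are hypothetical placeholders; the actual PowerMixer(IP)/CPE interfaces are not described in the abstract.

```python
# Minimal sketch of trace-based power estimation (phase 2 of the flow).
# The per-event energy figures would come from IP-level power models
# built in phase 1; the numbers below are placeholders, not measured data.
energy_per_event_nj = {"dsp_mac": 0.8, "dsp_load": 1.2, "bus_access": 2.5}

# Summarized simulation trace: event counts per IP over one profiling window.
trace_summary = {"dsp_mac": 1_000_000, "dsp_load": 250_000, "bus_access": 40_000}

window_seconds = 0.01  # assumed length of the profiling window

total_energy_nj = sum(trace_summary[e] * energy_per_event_nj[e] for e in trace_summary)
average_power_mw = total_energy_nj * 1e-9 / window_seconds * 1e3
print(f"estimated energy: {total_energy_nj * 1e-6:.2f} mJ, "
      f"average power: {average_power_mw:.1f} mW")
```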
Abstract:
The deep learning compiler tool Tensor Virtual Machine (TVM) has excellent deployment, compilation, and optimization capabilities, supported by the industry following the vigorous growth of neural networks (NN). It has a unified intermediate representation (IR) format that provides efficient compilation and portability. However, its high operational complexity requires considerable development effort. For beginners with programming backgrounds, a new and easy-to-use design approach is needed. This paper proposes a visual-concept approach that can execute artificial intelligence (AI) computing using block-based tools with AI knowledge. This research also develops a web-based NNBlocks framework that uses this approach to integrate with TVM. We conduct experiments to evaluate this approach: (1) interviewees assessed intuitiveness through hands-on operation; (2) interviewees answered a Usability Metric for User Experience (UMUX) questionnaire to evaluate usability; (3) interviewees answered a theme survey assessing the framework's significance; and (4) the impact on the system was evaluated through experiments. The results indicate that interviewees respond positively to the intuitiveness of the framework. The usability evaluation with UMUX meets expectations. The theme survey shows that the framework is significant for AI learning. The impact experiments indicate that the framework does not burden the system.
Abstract:
Minimization of power dissipation can be considered at the algorithmic, compiler, architectural, logic, and circuit levels. Recent research trends in multicore programming models suggest that parallel design patterns can be a solution for developing multicore applications. Because parallel design patterns exhibit regularity, we view this as a great opportunity to exploit power optimizations in the software layer. In this paper, we investigate compilers for low power with parallel design patterns on embedded multicore systems. We evaluate four major parallel design patterns: Pipe and Filter, MapReduce with Iterator, Puppeteer, and the Bulk Synchronous Parallel (BSP) model. Our work attempts to devise power optimization schemes in compilers by exploiting the recurring patterns of embedded multicore programs. The proposed optimization schemes are rate-based optimization for the Pipe and Filter pattern, early-exit power optimization for the MapReduce with Iterator pattern, a power-aware mapping algorithm for the Puppeteer pattern, and a multi-phase power gating scheme for the BSP pattern. In our experiments, real-world multicore applications are evaluated on a multicore power simulator, and significant power reductions are observed. We thus present a direction for power optimization in which one can further identify additional key design patterns for embedded multicore systems to explore power optimization opportunities via compilers.
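The following Python sketch illustrates the intuition behind a rate-based scheme for the Pipe and Filter pattern: because pipeline throughput is bounded by its slowest stage, faster stages can be slowed down (for example via frequency scaling) to roughly match that rate without hurting throughput. The stage rates and frequency levels here are invented for illustration and are not taken from the paper.

```python
def rate_based_frequencies(stage_rates, freq_levels):
    """Pick, per pipeline stage, the lowest frequency that still sustains
    the throughput of the slowest stage (the pipeline bottleneck).

    stage_rates: items/sec each stage achieves at the highest frequency
    freq_levels: available frequencies, as fractions of the maximum (0..1]
    """
    bottleneck = min(stage_rates)            # pipeline throughput bound
    chosen = []
    for rate in stage_rates:
        # lowest level whose scaled rate still meets the bottleneck rate
        level = min(l for l in freq_levels if rate * l >= bottleneck)
        chosen.append(level)
    return chosen

# Hypothetical 4-stage pipeline: stage 2 is the bottleneck at 40 items/sec,
# so the other stages can run at reduced frequencies.
print(rate_based_frequencies([120, 40, 80, 60], [0.25, 0.5, 0.75, 1.0]))
```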
Abstract:
Heterogeneous systems that consist of multiple CPUs and GPUs for high-performance computing are becoming increasingly popular, and OpenCL (Open Computing Language) provides a framework for writing programs that can be executed across heterogeneous devices. Compared with OpenCL 1.2, the new features of OpenCL 2.0 give developers better expressive power for programming heterogeneous computing environments. Currently, gem5-gpu, which combines gem5 and GPGPU-Sim, offers an experimental simulation environment for OpenCL. In gem5-gpu, gem5 only supports CUDA, although GPGPU-Sim can support OpenCL by compiling OpenCL kernel code to PTX code using real GPU drivers. However, this compilation flow in GPGPU-Sim only supports up to OpenCL 1.2. OpenCL 2.0 provides new features such as work-group built-in functions, extended atomic built-in functions, and device-side enqueue. To support OpenCL 2.0, the compiler must be extended to compile OpenCL 2.0 kernel code to PTX code. In this paper, we extend the low level virtual machine (LLVM) compiler with these features so that the emulator can support OpenCL 2.0. The proposed compiler creates local buffers for each work-group to enable work-group built-in functions and adds atomic built-in functions with memory order and memory scope for OpenCL 2.0 in NVPTX. Furthermore, APIs available in CUDA are utilized to implement the OpenCL 2.0 device-side enqueue kernel, and the compilation schemes in Clang are revised. The AMD APP SDK 3.0 and NTU OpenCL benchmarks are used to verify that the proposed compiler supports the features of OpenCL 2.0.
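As a language-neutral illustration of what a work-group built-in function such as a work-group reduction computes (not the paper's actual PTX lowering), the Python sketch below emulates the per-work-group semantics using one local buffer per work-group, mirroring the "local buffers for each work-group" idea described above. The data and work-group size are arbitrary.

```python
def work_group_reduce_add(global_data, work_group_size):
    """Emulate the semantics of OpenCL 2.0 work_group_reduce_add: every
    work-item observes the sum over its own work-group.

    A per-work-group local buffer holds the partial values, mimicking the
    local buffer the compiler allocates for work-group built-ins.
    """
    results = []
    for base in range(0, len(global_data), work_group_size):
        local_buffer = global_data[base:base + work_group_size]  # local memory
        group_sum = sum(local_buffer)                            # the reduction
        results.extend([group_sum] * len(local_buffer))          # broadcast to items
    return results

# 8 work-items with a work-group size of 4 -> two groups with sums 10 and 26.
print(work_group_reduce_add([1, 2, 3, 4, 5, 6, 7, 8], 4))
```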
Abstract:
Currently, GPGPU-Sim has become an important vehicle for academic architecture research. It is a cycle-accurate simulator that models contemporary graphics processing units. Machine learning is now widely used in applications such as self-driving cars, mobile devices, and medicine. With the popularity of mobile devices, mobile vendors are interested in porting machine learning or deep learning applications from computers to mobile devices, and Google has developed TensorFlow Lite and Android NNAPI for mobile and embedded devices. Since machine learning and deep learning are very computationally intensive, energy consumption has become a serious problem in mobile devices. Moreover, Moore's law cannot last forever; hence, the performance of mobile devices and of computers such as desktops or servers will see only limited enhancements in the foreseeable future. Therefore, performance and energy consumption are two issues of great concern. In this paper, we propose using a fixed-point data type, a low-power numerical representation that can reduce energy consumption and enhance performance in machine learning applications. We implemented the fixed-point instructions in the GPGPU-Sim simulator and observed the energy consumption and performance. Our evaluation demonstrates that, using the fixed-point instructions, the proposed design exhibits improved energy savings. Our experiments indicate that the fixed-point data type saves at least 14% of total GPU energy consumption compared with the floating-point data type.
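For readers unfamiliar with fixed-point arithmetic, the following Python sketch shows a generic Qm.n-style quantize/multiply/dequantize round trip. It only illustrates the data type itself; the Q-format chosen here is an assumption and the sketch does not represent the actual fixed-point instructions added to GPGPU-Sim.

```python
def to_fixed(x, frac_bits=8):
    """Quantize a real value to a signed fixed-point integer with
    frac_bits fractional bits (e.g., Q7.8 in a 16-bit container)."""
    return int(round(x * (1 << frac_bits)))

def fixed_mul(a, b, frac_bits=8):
    """Multiply two fixed-point values; the raw product carries
    2 * frac_bits fractional bits, so shift right to renormalize."""
    return (a * b) >> frac_bits

def to_float(x, frac_bits=8):
    """Convert a fixed-point integer back to a real value."""
    return x / (1 << frac_bits)

# Example: 1.5 * 2.25 computed in Q-format matches the floating-point result.
a, b = to_fixed(1.5), to_fixed(2.25)
print(to_float(fixed_mul(a, b)), "vs", 1.5 * 2.25)
```

Integer multiply-and-shift operations like these are generally cheaper in energy than floating-point multiplies, which is the intuition behind the reported savings.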
Abstract:
With the progress of medical science and technology and healthier eating habits, the proportion of the aged population is gradually increasing. Smart-home elderly care has thus attracted a lot of research attention in the recent past and remains an active issue. The Internet of Things (IoT) has been recognized as a key enabler for realizing smart-home elderly care. In the literature, a large number of IoT services/applications and platforms have been proposed for health/elderly care. However, research reports show that their adoption rate is very low, mainly because these packet-switched IoT data communication/networking approaches are too complicated from the perspective of older people. Statistical data reflect that circuit-switched voice telephony is still their most favored communication mechanism. In this paper, we aim to implement a circuit-switched approach to realizing smart-home elderly care IoT remote control based on our IoT platform, IoTtalk. To the best of our knowledge, our IoTtalk incorporating such capability is the first generic IoT platform in the literature that provides a telecommunication solution for smart-home elderly care. We design and implement an Android app, DTMFTalk, following the application development/execution framework of IoTtalk, to support IoT remote control via circuit-switched Dual-Tone Multi-Frequency (DTMF) signaling during a phone call conversation. Our real testbed deployment demonstrates that DTMFTalk can consistently and accurately recognize DTMF keys as long as the user holds the desired DTMF keys for a sufficient duration, justifying that DTMFTalk can serve as an effective approach to IoT remote control for smart-home elderly care.
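DTMF key recognition is commonly implemented with the Goertzel algorithm, which measures signal energy at the eight DTMF row/column frequencies; the Python sketch below demonstrates that standard technique on a synthetic tone. It is a generic illustration only and is not claimed to be DTMFTalk's actual recognition code.

```python
import math

ROW_FREQS = [697, 770, 852, 941]
COL_FREQS = [1209, 1336, 1477, 1633]
KEYS = [["1", "2", "3", "A"], ["4", "5", "6", "B"],
        ["7", "8", "9", "C"], ["*", "0", "#", "D"]]

def goertzel_power(samples, freq, sample_rate):
    """Energy at one frequency bin, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def detect_key(samples, sample_rate=8000):
    """Pick the strongest row and column frequency and map them to a key."""
    row = max(ROW_FREQS, key=lambda f: goertzel_power(samples, f, sample_rate))
    col = max(COL_FREQS, key=lambda f: goertzel_power(samples, f, sample_rate))
    return KEYS[ROW_FREQS.index(row)][COL_FREQS.index(col)]

# Synthesize the tone for key "5" (770 Hz + 1336 Hz) and detect it; holding the
# tone for many periods (here 100 ms at 8 kHz) makes detection reliable.
fs, n = 8000, 800
tone = [math.sin(2 * math.pi * 770 * t / fs) + math.sin(2 * math.pi * 1336 * t / fs)
        for t in range(n)]
print(detect_key(tone, fs))
```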